Fine-To-Coarse Global Registration of RGB-D Scans
RGB-D scanning of indoor environments is important for many applications,
including real estate, interior design, and virtual reality. However, it is
still challenging to register RGB-D images from a hand-held camera over a long
video sequence into a globally consistent 3D model. Current methods often lose
tracking or drift and thus fail to reconstruct salient structures in large
environments (e.g., parallel walls in different rooms). To address this
problem, we propose a "fine-to-coarse" global registration algorithm that
leverages robust registrations at finer scales to seed detection and
enforcement of new correspondence and structural constraints at coarser scales.
To test global registration algorithms, we provide a benchmark with 10,401
manually-clicked point correspondences in 25 scenes from the SUN3D dataset.
During experiments with this benchmark, we find that our fine-to-coarse
algorithm registers long RGB-D sequences better than previous methods.
Rescan: Inductive Instance Segmentation for Indoor RGBD Scans
In depth-sensing applications ranging from home robotics to AR/VR, it will be
common to acquire 3D scans of interior spaces repeatedly at sparse time
intervals (e.g., as part of regular daily use). We propose an algorithm that
analyzes these "rescans" to infer a temporal model of a scene with semantic
instance information. Our algorithm operates inductively by using the temporal
model resulting from past observations to infer an instance segmentation of a
new scan, which is then used to update the temporal model. The model contains
object instance associations across time and thus can be used to track
individual objects, even though there are only sparse observations. During
experiments with a new benchmark for the new task, our algorithm outperforms
alternate approaches based on state-of-the-art networks for semantic instance
segmentation.
Comment: IEEE International Conference on Computer Vision 201
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
RGBD Pipeline for Indoor Scene Reconstruction and Understanding
In this work, we consider the problem of reconstructing a 3D model from a sequence of color and depth frames. Generating such a model has many important applications, ranging from the entertainment industry to real estate. However, transforming RGBD frames into high-quality 3D models is a challenging problem, especially if additional semantic information is required. In this document, we introduce three projects that implement successive stages of a robust RGBD processing pipeline.
First, we consider the challenges arising during the RGBD data capture process. While depth cameras provide dense, per-pixel depth measurements, the resulting data carries non-trivial error. We discuss the depth generation problem and propose an error reduction technique based on estimating an image-space undistortion field. We describe the capture process for the data required to generate such an undistortion field, and we show how correcting the depth measurements improves reconstruction quality.
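The image-space correction described above can be sketched as a per-pixel multiplicative undistortion field. Everything below is an illustrative assumption rather than the thesis' exact formulation: the field is calibrated by averaging the ratio of reference depth to measured depth over many frames, then applied to new raw frames.

```python
import numpy as np

def calibrate_undistortion_field(measured, reference):
    """Estimate a per-pixel multiplicative correction field by averaging
    the ratio of reference depth to measured depth over calibration frames.
    measured, reference: arrays of shape (n_frames, H, W); zeros in
    `measured` mark invalid pixels."""
    valid = measured > 0
    ratio = np.where(valid, reference / np.maximum(measured, 1e-6), 0.0)
    counts = valid.sum(axis=0)
    field = ratio.sum(axis=0) / np.maximum(counts, 1)
    field[counts == 0] = 1.0  # no calibration data: leave depth unchanged
    return field

def undistort(depth, field):
    """Apply the image-space undistortion field to a raw depth frame."""
    return depth * field
```

In practice the reference depth would come from scanning a known planar target; here the field is simply the mean correction ratio per pixel.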
Second, we address the problem of registering RGBD frames over a long video sequence into a globally consistent 3D model. We propose a "fine-to-coarse" global registration algorithm that leverages robust registrations at finer scales to seed detection and enforcement of geometrical constraints, modeled as planar structures, at coarser scales. To test global registration algorithms, we provide a benchmark with 10,401 manually-clicked point correspondences in 25 scenes from the SUN3D dataset. We find that our fine-to-coarse algorithm registers long RGBD sequences better than previous methods.
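As a structural illustration only, the fine-to-coarse loop might look like the toy sketch below: rigidly align each segment of the trajectory to its predecessor, then merge adjacent segments so the next pass operates at a coarser scale. The 2D point sets, the Kabsch-style least-squares alignment, and the pairwise merge rule are all stand-in assumptions for the paper's correspondence and planar-structure constraints.

```python
import numpy as np

def kabsch_2d(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst (both Nx2)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cd - R @ cs

def fine_to_coarse(segments, levels=3):
    """Toy fine-to-coarse loop: at each level, rigidly align every segment
    to the end of its (already aligned) predecessor, then merge adjacent
    pairs so the next level registers coarser, longer subsequences.
    segments: list of (N_i x 2) point arrays in scan order."""
    for _ in range(levels):
        if len(segments) < 2:
            break
        aligned = [segments[0]]
        for seg in segments[1:]:
            n = min(len(seg), len(aligned[-1]))
            # Register the overlapping portions (stand-in for the paper's
            # correspondence + planar-structure constraints).
            R, t = kabsch_2d(seg[:n], aligned[-1][-n:])
            aligned.append(seg @ R.T + t)
        # Merge adjacent segments: coarser scale for the next pass.
        segments = [np.vstack(aligned[i:i + 2])
                    for i in range(0, len(aligned), 2)]
    return np.vstack(segments)
```

The key structural point is that alignments trusted at the fine level are frozen into the merged segments, so each coarser level only has to resolve larger-scale inconsistencies.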
Finally, we show how repeated scans of the same space can be used to establish associations between the different observations. Specifically, we consider a situation where 3D scans are acquired repeatedly at sparse time intervals. We develop an algorithm that analyzes these "rescans" and builds a temporal model of a scene with semantic instance information. The proposed algorithm operates inductively by using a temporal model resulting from past observations to infer an instance segmentation of a new scan. The temporal model is continuously updated to reflect the changes that occur in the scene over time, providing object associations across time. The algorithm outperforms alternate approaches based on state-of-the-art networks for semantic instance segmentation.
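The inductive update could be sketched as greedy matching between tracked objects in the temporal model and the instances detected in a new scan. The axis-aligned-box IoU criterion, the greedy matching order, and the update rule below are simplifying assumptions, not the thesis' method.

```python
import numpy as np

def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes, each given as (min_xyz, max_xyz)."""
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    inter = np.prod(np.maximum(hi - lo, 0.0))
    vol_a = np.prod(a[1] - a[0])
    vol_b = np.prod(b[1] - b[0])
    return inter / (vol_a + vol_b - inter)

def associate(model, detections, thresh=0.25):
    """Greedily match detected instances of a new scan to tracked objects
    in the temporal model by bounding-box overlap, then update the model:
    matched objects take the new box (the object may have moved),
    unmatched detections spawn new tracks.
    model: dict object_id -> box; detections: list of boxes."""
    matches = {}
    used = set()
    for det_id, det_box in enumerate(detections):
        best, best_iou = None, thresh
        for obj_id, obj_box in model.items():
            if obj_id in used:
                continue
            score = iou_3d(obj_box, det_box)
            if score > best_iou:
                best, best_iou = obj_id, score
        if best is not None:
            matches[det_id] = best
            used.add(best)
            model[best] = det_box  # object persists: refresh its pose
        else:
            new_id = max(model, default=-1) + 1
            model[new_id] = det_box  # previously unseen object
            matches[det_id] = new_id
    return matches
```

Because the model is carried forward and updated after every scan, associations accumulate over time even though each individual observation is sparse.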
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
A key requirement for leveraging supervised deep learning methods is the
availability of large, labeled datasets. Unfortunately, in the context of RGB-D
scene understanding, very little data is available -- current datasets cover a
small range of scene views and have limited semantic annotations. To address
this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views
in 1513 scenes annotated with 3D camera poses, surface reconstructions, and
semantic segmentations. To collect this data, we designed an easy-to-use and
scalable RGB-D capture system that includes automated surface reconstruction
and crowdsourced semantic annotation. We show that using this data helps
achieve state-of-the-art performance on several 3D scene understanding tasks,
including 3D object classification, semantic voxel labeling, and CAD model
retrieval. The dataset is freely available at http://www.scan-net.org